Overview

Dataset Statistics

Number of Variables 10
Number of Rows 3276
Missing Cells 1434
Missing Cells (%) 4.4%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 256.1 KB
Average Row Size in Memory 80.0 B
Variable Types
  • Numerical: 9
  • Categorical: 1

Dataset Insights

ph has 491 (14.99%) missing values
Sulfate has 781 (23.84%) missing values
Trihalomethanes has 162 (4.95%) missing values
Potability has constant length 1
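
The missing-value percentages above follow directly from the row and variable counts in the overview. A minimal sketch of that arithmetic, using only figures copied from this report (the helper name missing_pct is illustrative and not part of any profiling library):

```python
# Figures copied from the overview and insights in this report.
n_rows = 3276
n_vars = 10
missing_by_column = {"ph": 491, "Sulfate": 781, "Trihalomethanes": 162}


def missing_pct(n_missing: int, n_total: int) -> float:
    """Share of missing entries, expressed as a percentage."""
    return 100.0 * n_missing / n_total


# Per-column share: missing values divided by the number of rows.
per_column = {
    name: round(missing_pct(n, n_rows), 2)
    for name, n in missing_by_column.items()
}

# Overall share: all missing cells divided by rows * variables.
total_missing = sum(missing_by_column.values())
overall = round(missing_pct(total_missing, n_rows * n_vars), 1)

print(per_column)
print(total_missing, overall)
```

The three per-column counts sum to the 1434 missing cells reported in the overview, so no other variable contributes missing values.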

Variables


ph

numerical

Approximate Distinct Count 2785
Approximate Unique (%) 100.0%
Missing 491
Missing (%) 15.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 44560 B
Mean 7.0808
Minimum 0
Maximum 14
Zeros 1
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • ph is skewed right (γ1 = 0.0256)

Quantile Statistics

Minimum 0
5-th Percentile 4.488
Q1 6.0931
Median 7.0368
Q3 8.0621
95-th Percentile 9.7898
Maximum 14
Range 14
IQR 1.969

Descriptive Statistics

Mean 7.0808
Standard Deviation 1.5943
Variance 2.5419
Sum 19720.0127
Skewness 0.02562
Kurtosis 0.7169
Coefficient of Variation 0.2252
  • ph is not normally distributed (p-value 0.00804009537965166)
  • ph has 46 outliers
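
Several of the ph figures above are derived from one another, which gives a quick consistency check. The sketch below recomputes them from the reported values; small differences are expected because the report rounds to four decimals.

```python
# Reported ph statistics, copied from the tables above.
mean, std = 7.0808, 1.5943
q1, q3 = 6.0931, 8.0621
minimum, maximum = 0.0, 14.0
total = 19720.0127
n_rows, n_missing = 3276, 491

variance = std ** 2                  # reported as 2.5419
cv = std / mean                      # coefficient of variation, reported as 0.2252
iqr = q3 - q1                        # reported as 1.969
value_range = maximum - minimum      # reported as 14
n_non_missing = round(total / mean)  # sum / mean recovers the non-missing count

print(variance, cv, iqr, value_range, n_non_missing)
```

The recovered count equals 3276 - 491, which suggests that the mean and sum are computed over non-missing values only.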

Hardness

numerical

Approximate Distinct Count 3276
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 52416 B
Mean 196.3695
Minimum 47.432
Maximum 323.124
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Hardness is skewed left (γ1 = -0.0393)

Quantile Statistics

Minimum 47.432
5-th Percentile 141.7633
Q1 176.8505
Median 196.9676
Q3 216.6675
95-th Percentile 249.6098
Maximum 323.124
Range 275.692
IQR 39.8169

Descriptive Statistics

Mean 196.3695
Standard Deviation 32.8798
Variance 1081.0787
Sum 643306.469
Skewness -0.03932
Kurtosis 0.613
Coefficient of Variation 0.1674
  • Hardness is not normally distributed (p-value 0.009985038018096472)
  • Hardness has 83 outliers
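
The report does not state which rule it uses to count outliers. One common convention is Tukey's 1.5 × IQR fences; the sketch below applies that rule to the Hardness quartiles as an assumption, not as the report's actual method. The function name tukey_fences is illustrative.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences; values outside them are flagged as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hardness quartiles and extremes, copied from the tables above.
q1, q3 = 176.8505, 216.6675
minimum, maximum = 47.432, 323.124

lower, upper = tukey_fences(q1, q3)
print(lower, upper)

# Both extremes fall outside the fences, so under this assumed rule
# Hardness has outliers on both tails.
print(minimum < lower, maximum > upper)
```

The outlier count itself (83) cannot be reproduced from summary statistics alone; that requires the raw column.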

Solids

numerical

Approximate Distinct Count 3276
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 52416 B
Mean 22014.0925
Minimum 320.9426
Maximum 61227.196
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Solids is skewed right (γ1 = 0.6213)

Quantile Statistics

Minimum 320.9426
5-th Percentile 9545.8126
Q1 15666.6903
Median 20927.8336
Q3 27332.7621
95-th Percentile 38474.9902
Maximum 61227.196
Range 60906.2534
IQR 11666.0718

Descriptive Statistics

Mean 22014.0925
Standard Deviation 8768.5708
Variance 7.6888e+07
Sum 7.2118e+07
Skewness 0.6213
Kurtosis 0.4403
Coefficient of Variation 0.3983
  • Solids is not normally distributed (p-value 0.0028870227076398043)
  • Solids has 47 outliers
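
The skewness coefficient γ1 reported for each variable is the third standardized moment. A minimal sketch of the population form follows; whether the report applies a bias correction is not stated, so treat the exact estimator as an assumption. The data here is toy data, not the Solids column.

```python
def sample_skewness(values: list[float]) -> float:
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]      # toy data, no skew
right_tailed = [1.0, 2.0, 3.0, 4.0, 20.0]  # toy data, long right tail

print(sample_skewness(symmetric))
print(sample_skewness(right_tailed))
```

A positive value indicates a longer right tail, as for Solids (γ1 = 0.6213); values close to zero, such as those for ph or Turbidity, indicate a nearly symmetric distribution.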

Chloramines

numerical

Approximate Distinct Count 3276
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 52416 B
Mean 7.1223
Minimum 0.352
Maximum 13.127
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Chloramines is skewed left (γ1 = -0.0121)

Quantile Statistics

Minimum 0.352
5-th Percentile 4.5031
Q1 6.1274
Median 7.1303
Q3 8.1149
95-th Percentile 9.7531
Maximum 13.127
Range 12.775
IQR 1.9875

Descriptive Statistics

Mean 7.1223
Standard Deviation 1.5831
Variance 2.5062
Sum 23332.5788
Skewness -0.01209
Kurtosis 0.5872
Coefficient of Variation 0.2223
  • Chloramines has 61 outliers

Sulfate

numerical

Approximate Distinct Count 2495
Approximate Unique (%) 100.0%
Missing 781
Missing (%) 23.8%
Infinite 0
Infinite (%) 0.0%
Memory Size 39920 B
Mean 333.7758
Minimum 129
Maximum 481.0306
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Sulfate is skewed left (γ1 = -0.0359)

Quantile Statistics

Minimum 129
5-th Percentile 266.6162
Q1 307.6995
Median 333.0735
Q3 359.9502
95-th Percentile 403.0702
Maximum 481.0306
Range 352.0306
IQR 52.2507

Descriptive Statistics

Mean 333.7758
Standard Deviation 41.4168
Variance 1715.3547
Sum 832770.5626
Skewness -0.03593
Kurtosis 0.6446
Coefficient of Variation 0.1241
  • Sulfate has 41 outliers

Conductivity

numerical

Approximate Distinct Count 3276
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 52416 B
Mean 426.2051
Minimum 181.4838
Maximum 753.3426
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Conductivity is skewed right (γ1 = 0.2644)

Quantile Statistics

Minimum 181.4838
5-th Percentile 300.1095
Q1 365.7344
Median 421.885
Q3 481.7923
95-th Percentile 566.3493
Maximum 753.3426
Range 571.8589
IQR 116.0579

Descriptive Statistics

Mean 426.2051
Standard Deviation 80.8241
Variance 6532.5293
Sum 1.3962e+06
Skewness 0.2644
Kurtosis -0.2785
Coefficient of Variation 0.1896
  • Conductivity is not normally distributed (p-value 3.972806177071302e-06)
  • Conductivity has 11 outliers

Organic_carbon

numerical

Approximate Distinct Count 3276
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 52416 B
Mean 14.285
Minimum 2.2
Maximum 28.3
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Organic_carbon is skewed right (γ1 = 0.0255)

Quantile Statistics

Minimum 2.2
5-th Percentile 8.8154
Q1 12.0658
Median 14.2183
Q3 16.5577
95-th Percentile 19.6373
Maximum 28.3
Range 26.1
IQR 4.4919

Descriptive Statistics

Mean 14.285
Standard Deviation 3.3082
Variance 10.9439
Sum 46797.5625
Skewness 0.02552
Kurtosis 0.04251
Coefficient of Variation 0.2316
  • Organic_carbon has 25 outliers

Trihalomethanes

numerical

Approximate Distinct Count 3114
Approximate Unique (%) 100.0%
Missing 162
Missing (%) 4.9%
Infinite 0
Infinite (%) 0.0%
Memory Size 49824 B
Mean 66.3963
Minimum 0.738
Maximum 124
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Trihalomethanes is skewed left (γ1 = -0.083)

Quantile Statistics

Minimum 0.738
5-th Percentile 39.5529
Q1 55.8445
Median 66.6225
Q3 77.3375
95-th Percentile 92.1241
Maximum 124
Range 123.262
IQR 21.4929

Descriptive Statistics

Mean 66.3963
Standard Deviation 16.175
Variance 261.6309
Sum 206758.0562
Skewness -0.08299
Kurtosis 0.2363
Coefficient of Variation 0.2436
  • Trihalomethanes has 33 outliers

Turbidity

numerical

Approximate Distinct Count 3276
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 52416 B
Mean 3.9668
Minimum 1.45
Maximum 6.739
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Turbidity is skewed left (γ1 = -0.0078)

Quantile Statistics

Minimum 1.45
5-th Percentile 2.6843
Q1 3.4397
Median 3.955
Q3 4.5003
95-th Percentile 5.2209
Maximum 6.739
Range 5.289
IQR 1.0606

Descriptive Statistics

Mean 3.9668
Standard Deviation 0.7804
Variance 0.609
Sum 12995.1915
Skewness -0.007813
Kurtosis -0.06454
Coefficient of Variation 0.1967
  • Turbidity is not normally distributed (p-value 0.0022451895219038603)
  • Turbidity has 19 outliers

Potability

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 216216 B
  • The most frequent value (0) occurs over 1.56 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 3276
  • The top 2 categories (0, 1) account for over 50.0% of values
  • Potability has words of constant length
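
The frequency ratio between the two Potability classes can be turned into a class share, which is often easier to read when judging imbalance. A minimal sketch, using hypothetical toy labels rather than the actual column; the function names are illustrative.

```python
from collections import Counter


def imbalance_ratio(labels: list[str]) -> tuple[str, str, float]:
    """Return (most frequent label, runner-up label, count ratio)."""
    (top, n_top), (second, n_second) = Counter(labels).most_common(2)
    return top, second, n_top / n_second


def majority_share(ratio: float) -> float:
    """Share of the majority class in a two-class column with the given ratio."""
    return ratio / (1.0 + ratio)


# Hypothetical toy labels, not the real Potability column.
toy_labels = ["0"] * 8 + ["1"] * 5
print(imbalance_ratio(toy_labels))

# With the ratio of "over 1.56" reported above, class 0 makes up
# at least this share of the rows:
print(majority_share(1.56))
```

A ratio of 1.56 corresponds to a majority share of roughly 61%, so the classes are moderately imbalanced rather than severely so.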

Interactions

Correlations

Missing Values